AITopics | optimal hyperparameter

Collaborating Authors

optimal hyperparameter

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

23ee05bf1f4ade71c0f8f5ca722df601-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsApr-25-2026, 02:24:23 GMT

artificial intelligence, engcn, machine learning, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

0be50b4590f1c5fdf4c8feddd63c4f67-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsApr-24-2026, 16:50:03 GMT

In Figure 1 we demonstrate the common neighbor (CN) distribution among positive and negative test samples for ogbl-collab, ogbl-ppa, and ogbl-citation2. These results demonstrate that a vast majority of negative samples have no CNs. Since CNs is a typically good heuristic, this makes it easy to identify most negative samples. We further present the CN distribution of Cora, Citeseer, Pubmed, and ogbl-ddi in Figure 3. The CN distribution of Cora, Citeseer, and Pubmed are consistent with our previous observations on the OGB datasets in Figure 1.

artificial intelligence, machine learning, negative sample, (17 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

Add feedback

34d3cf97696022b179171e5abda42c0b-Paper-Conference.pdf

Neural Information Processing SystemsFeb-11-2026, 00:28:25 GMT

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Education (1.00)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Optimizing Optimizers for Fast Gradient-Based Learning

Lee, Jaerin, Lee, Kyoung Mu

arXiv.org Machine LearningDec-9-2025

We lay the theoretical foundation for automating optimizer design in gradient-based learning. Based on the greedy principle, we formulate the problem of designing optimizers as maximizing the instantaneous decrease in loss. By treating an optimizer as a function that translates loss gradient signals into parameter motions, the problem reduces to a family of convex optimization problems over the space of optimizers. Solving these problems under various constraints not only recovers a wide range of popular optimizers as closed-form solutions, but also produces the optimal hyperparameters of these optimizers with respect to the problems at hand. This enables a systematic approach to design optimizers and tune their hyperparameters according to the gradient statistics that are collected during the training process. Furthermore, this optimization of optimization can be performed dynamically during training. Just as optimizers train their models by feeding them parameter velocities θ, models can also fit the optimizers to the underlying tasks by feeding gradients g. We are interested in the problem of designing optimiz-ers that maximize the utility of gradient-based learning for a given task. The process of learning manifests as the parameter motion θ driven by the gradient force g applied at each step t. Physics requires a constitutive law that relates kinematic motion to its motive force. In gradient-based learning, optimizers take that role. We can represent an optimizer as a positive semidefinite operator Q 0 that linearly translates the gradients into the parameter updates, θ = Q g. (1) Later sections will reveal that many existing optimizers fall into this category. Q g. (2) Adhering to the greedy paradigm, we turn our original problem of maximizing the utility of learning into a different optimization problem that maximizes this loss drop with respect to the optimizer Q: maximize Problem P1 reveals two design options that bound this maximum: (1) the trust region implied by the feasible set Q Q, and (2) the gradient distribution under the expectation E. Our main focus is on how these two factors determine the optimal optimizer Q Optimizers and their hyperparameters can be dynamically tuned or even be replaced by better ones according to the intermediate probes from the gradients in the middle of training. By reverse engineering commonly used optimizers, we draw the landscape of optimizers that have driven the success of machine learning (Robbins & Monro, 1951; Kingma & Ba, 2015; Loshchilov & Hutter, 2019; Gupta et al., 2018; Martens & Grosse, 2015) into a single picture. This lets us better use the well-studied optimizers in practice and also suggest extensions to them. Note that Σ is a symmetric and positive semidefinite (PSD) matrix of shape d d.

hyperparameter, optimal optimizer, optimizer, (14 more...)

arXiv.org Machine Learning

2512.0637

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

34d3cf97696022b179171e5abda42c0b-Paper-Conference.pdf

Neural Information Processing SystemsOct-9-2025, 23:02:52 GMT

arxiv preprint arxiv, dataset, hyperparameter, (13 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Education (1.00)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

36ac8e558ac7690b6f44e2cb5ef93322-Supplemental.pdf

Neural Information Processing SystemsOct-2-2025, 16:12:08 GMT

artificial intelligence, bioinformatics, machine learning, (19 more...)

Neural Information Processing Systems

Genre: Research Report (0.70)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Biomedical Informatics > Translational Bioinformatics (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

ac56f8fe9eea3e4a365f29f0f1957c55-Supplemental.pdf

Neural Information Processing SystemsAug-16-2025, 17:42:05 GMT

artificial intelligence, el2n score, machine learning, (18 more...)

Neural Information Processing Systems

Industry: Transportation (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

BO-SA-PINNs: Self-adaptive physics-informed neural networks based on Bayesian optimization for automatically designing PDE solvers

Zhang, Rui, Li, Liang, Lanteri, Stéphane, Kang, Hao, Li, Jiaqi

arXiv.org Artificial IntelligenceApr-15-2025

Physics-informed neural networks (PINNs) is becoming a popular alternative method for solving partial differential equations (PDEs). However, they require dedicated manual modifications to the hyperparameters of the network, the sampling methods and loss function weights for different PDEs, which reduces the efficiency of the solvers. In this paper, we pro- pose a general multi-stage framework, i.e. BO-SA-PINNs to alleviate this issue. In the first stage, Bayesian optimization (BO) is used to select hyperparameters for the training process, and based on the results of the pre-training, the network architecture, learning rate, sampling points distribution and loss function weights suitable for the PDEs are automatically determined. The proposed hyperparameters search space based on experimental results can enhance the efficiency of BO in identifying optimal hyperparameters. After selecting the appropriate hyperparameters, we incorporate a global self-adaptive (SA) mechanism the second stage. Using the pre-trained model and loss information in the second-stage training, the exponential moving average (EMA) method is employed to optimize the loss function weights, and residual-based adaptive refinement with distribution (RAR-D) is used to optimize the sampling points distribution. In the third stage, L-BFGS is used for stable training. In addition, we introduce a new activation function that enables BO-SA-PINNs to achieve higher accuracy. In numerical experiments, we conduct comparative and ablation experiments to verify the performance of the model on Helmholtz, Maxwell, Burgers and high-dimensional Poisson equations. The comparative experiment results show that our model can achieve higher accuracy and fewer iterations in test cases, and the ablation experiments demonstrate the positive impact of every improvement.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2504.09804

Country: Europe (0.46)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Towards Precise Scaling Laws for Video Diffusion Transformers

Yin, Yuanyang, Zhao, Yaqi, Zheng, Mingwu, Lin, Ke, Ou, Jiarong, Chen, Rui, Huang, Victor Shea-Jay, Wang, Jiahao, Tao, Xin, Wan, Pengfei, Zhang, Di, Yin, Baoqun, Zhang, Wentao, Gai, Kun

arXiv.org Artificial IntelligenceDec-31-2024

Achieving optimal performance of video diffusion transformers within given data and compute budget is crucial due to their high training costs. This necessitates precisely determining the optimal model size and training hyperparameters before large-scale training. While scaling laws are employed in language models to predict performance, their existence and accurate derivation in visual generation models remain underexplored. In this paper, we systematically analyze scaling laws for video diffusion transformers and confirm their presence. Moreover, we discover that, unlike language models, video diffusion models are more sensitive to learning rate and batch size, two hyperparameters often not precisely modeled. To address this, we propose a new scaling law that predicts optimal hyperparameters for any model size and compute budget. Under these optimal settings, we achieve comparable performance and reduce inference costs by 40.1% compared to conventional scaling methods, within a compute budget of 1e10 TFlops. Furthermore, we establish a more generalized and precise relationship among validation loss, any model size, and compute budget. This enables performance prediction for non-optimal model sizes, which may also be appealed under practical inference cost constraints, achieving a better trade-off.

hyperparameter, model size, optimal hyperparameter, (14 more...)

arXiv.org Artificial Intelligence

2411.1747

Country: